aei_no_std <- read.csv("/Volumes/RachelExternal/Thesis/Data_upload_for_CL/AEI_NoStd.csv") #some data
source("/Volumes/RachelExternal/Thesis/Thesis/Thesis_Functions.R") #some functions

Histograms

We’ve got a lot of things here to log, scale or center.

What do I mean by scaling or centering?

Scaling: Rescaling the predictor to the range [0,1] using the formula: \(\frac{y_i}{\max(y)}\). This assumes the predictor is non-negative.

Centering: Shifting the predictor so that its mean is 0 and its standard deviation is 1, using the formula: \(\frac{y_i-mean(y)}{sd(y)}\)
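To make the two definitions concrete, here is a minimal sketch on a made-up vector. The values of `y` are hypothetical and are not taken from the thesis data; the only functions used are from base R.

```r
# Toy predictor (hypothetical values, not from the thesis data)
y <- c(2, 4, 6, 8, 10)

# Scaling: divide by the maximum, so non-negative values fall in [0, 1]
y_scaled <- y / max(y)

# Centering: subtract the mean, then divide by the standard deviation
y_centered <- (y - mean(y)) / sd(y)

# base R's scale() does the same centering; it returns a one-column matrix,
# so as.numeric() is used here to get a plain vector back
y_centered_base <- as.numeric(scale(y))

range(y_scaled)      # largest value is 1
mean(y_centered)     # zero, up to floating point error
sd(y_centered)       # one
```

Note that dividing by the maximum only pins the upper end at 1. The lower end reaches 0 only if the predictor itself contains a 0.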

Why must I do this?

Well, a couple of reasons. First and foremost, it makes the specification of priors easier, as the distribution of the predictor is already centered on its mean and, for roughly normal data, most of the points fall within one standard deviation on each side. Another reason is that it makes the interpretation of the coefficients a bit easier, as you can clearly tell which have positive or negative effects.

Let's plot what these look like without being transformed. These graphs are also interactive; feel free to click around.

Main Predictors

Some things of note here: I will be using income instead of total GDP, median Humidity and PET instead of Average, and Humidity with the Inf/NaN values replaced.

A lot of things here are skewed right (most of the mass sits at low values, with a long tail of large ones), which is to be expected, as the majority of countries are smaller rather than bigger in all aspects. There is little noticeable difference between the regions in many of the predictor variables. Income has trends as you would expect, with Europe having higher incomes and Sub-Saharan Africa having lower, with some of the high outliers being from North Africa and the Middle East. There are bigger regional differences in Humidity and PET (remember these two are related: \(Humidity = Precip/PET\)). Precip looks similar to Humidity (again, similarities were expected). For Ruggedness, regional trends are tough to find visually.
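As a reminder of where the Inf/NaN values in Humidity come from, here is a small sketch of the ratio \(Humidity = Precip/PET\) on made-up numbers. The vectors `precip` and `pet` are hypothetical, and replacing the non-finite values with `NA` is an assumption for illustration only; the replacement rule actually used for the thesis data is not shown here.

```r
# Hypothetical precipitation and PET values (not from the thesis data)
precip <- c(500, 0, 120, 0)
pet    <- c(1000, 800, 0, 0)

# Humidity as the ratio of precipitation to PET
humidity <- precip / pet   # a zero PET gives Inf (120/0) or NaN (0/0)

# Assumption for illustration: treat non-finite ratios as missing
humidity_clean <- ifelse(is.finite(humidity), humidity, NA_real_)

humidity
humidity_clean
```

Whatever replacement is chosen, it has to happen before centering, since a single `Inf` makes both the mean and the standard deviation non-finite.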

Crop Fractions

What about crop fractions?

Standardization

A lot of these predictors, I assume, accumulate exponentially. Population, income and total GDP are obvious ones. Some others seem to be exponential as well, given the distribution of their histograms. Precipitation seems to accumulate exponentially, as do most of the crop fractions.
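The effect of logging a predictor that accumulates exponentially can be sketched with simulated data. Everything here is hypothetical: `pop` is drawn from a log-normal distribution purely to mimic a right-skewed variable, and it is not the thesis population data.

```r
set.seed(1)

# Simulated right-skewed predictor (log-normal), standing in for population
pop <- exp(rnorm(1000, mean = 10, sd = 2))

# On the raw scale the long right tail pulls the mean far above the median
raw_gap <- mean(pop) - median(pop)

# Log, then center: the result is roughly symmetric around 0
log_pop <- log(pop)
pop_std <- (log_pop - mean(log_pop)) / sd(log_pop)
std_gap <- mean(pop_std) - median(pop_std)

raw_gap   # large and positive
std_gap   # close to zero
```

One caveat: `log()` needs strictly positive values, since `log(0)` is `-Inf`, so a predictor containing zeros cannot be logged directly.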

  • area_km - Center
  • population - Log and Center
  • income - Log and Center
  • GDP - Log and Center
  • Median Humidity - Center
  • Median PET - Center
  • Precip - Log and Center
  • Ruggedness - Scale
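
aei_std <-
  aei_no_std %>% 
  # drop four columns by position (indices refer to the raw file)
  select(-c(20, 21, 24, 28)) %>% 
  # log: population, income, GDPtot, cubM_precip
  mutate(across(c(17:19, 23), log)) %>% 
  # center: area_km, population through cubM_precip, and all crop fractions
  mutate(across(c(8, 17:19, 20:23, 25:51), scale, scale = TRUE)) %>% 
  # scale rugged to [0,1]; normalized() is presumably from Thesis_Functions.R
  mutate(across(c(24), normalized))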

summary(aei_std)
##        X              ISO                 year        country         
##  Min.   :   1.0   Length:3003        Min.   :1910   Length:3003       
##  1st Qu.: 751.5   Class :character   1st Qu.:1940   Class :character  
##  Median :1502.0   Mode  :character   Median :1970   Mode  :character  
##  Mean   :1502.0                      Mean   :1964                     
##  3rd Qu.:2252.5                      3rd Qu.:1990                     
##  Max.   :3003.0                      Max.   :2005                     
##                                                                       
##        ID            aei_ha           yearcount         area_km.V1    
##  Min.   :  1.0   Min.   :       0   Min.   : 0.00   Min.   :-0.34652  
##  1st Qu.: 62.0   1st Qu.:       0   1st Qu.:30.00   1st Qu.:-0.33996  
##  Median :125.0   Median :   12852   Median :60.00   Median :-0.29198  
##  Mean   :125.5   Mean   :  804385   Mean   :54.23   Mean   : 0.00000  
##  3rd Qu.:187.0   3rd Qu.:  200700   3rd Qu.:80.00   3rd Qu.:-0.09049  
##  Max.   :400.0   Max.   :64646000   Max.   :95.00   Max.   : 8.91977  
##                                                     NA's   :260       
##     irrperc            irrfrac        four_regions       eight_regions     
##  Min.   : 0.00000   Min.   :0.00000   Length:3003        Length:3003       
##  1st Qu.: 0.00586   1st Qu.:0.00006   Class :character   Class :character  
##  Median : 0.21376   Median :0.00214   Mode  :character   Mode  :character  
##  Mean   : 1.67096   Mean   :0.01671                                        
##  3rd Qu.: 1.53916   3rd Qu.:0.01539                                        
##  Max.   :37.41152   Max.   :0.37412                                        
##  NA's   :260        NA's   :260                                            
##  six_regions           Latitude        Longitude       World.bank.region 
##  Length:3003        Min.   :-42.00   Min.   :-175.00   Length:3003       
##  Class :character   1st Qu.:  4.00   1st Qu.:  -9.50   Class :character  
##  Mode  :character   Median : 17.05   Median :  20.00   Mode  :character  
##                     Mean   : 18.81   Mean   :  20.06                     
##                     3rd Qu.: 39.75   3rd Qu.:  48.00                     
##                     Max.   : 65.00   Max.   : 179.14                     
##                     NA's   :650      NA's   :650                         
##    population.V1       income.V1         GDPtot.V1        medHumid.V1    
##  Min.   :-3.0977   Min.   :-2.0902   Min.   :-3.2642   Min.   :-1.23445  
##  1st Qu.:-0.6125   1st Qu.:-0.7890   1st Qu.:-0.6816   1st Qu.:-0.78071  
##  Median : 0.1513   Median :-0.1326   Median : 0.0615   Median :-0.06354  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.00000  
##  3rd Qu.: 0.6386   3rd Qu.: 0.6983   3rd Qu.: 0.6851   3rd Qu.: 0.54192  
##  Max.   : 2.7152   Max.   : 3.1518   Max.   : 2.7272   Max.   : 8.91519  
##  NA's   :650       NA's   :663       NA's   :663       NA's   :143       
##     medHumid2.V1        medPET.V1        cubM_precip.V1       rugged       
##  Min.   :-1.23054   Min.   :-2.75410   Min.   :-4.04948   Min.   :0.00000  
##  1st Qu.:-0.78834   1st Qu.:-0.67067   1st Qu.:-0.67945   1st Qu.:0.04519  
##  Median :-0.06774   Median : 0.26460   Median : 0.02136   Median :0.12290  
##  Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000   Mean   :0.17603  
##  3rd Qu.: 0.52698   3rd Qu.: 0.69194   3rd Qu.: 0.69510   3rd Qu.:0.25451  
##  Max.   : 8.66093   Max.   : 2.10413   Max.   : 2.65292   Max.   :1.00000  
##  NA's   :143        NA's   :143        NA's   :143        NA's   :52       
##  Temperate_cereals.V1      Rice.V1            Maize.V1      Tropical_cereals.V1
##  Min.   :-0.71529     Min.   :-0.52190   Min.   :-0.72777   Min.   :-0.34949   
##  1st Qu.:-0.61383     1st Qu.:-0.52190   1st Qu.:-0.64338   1st Qu.:-0.34040   
##  Median :-0.17326     Median :-0.41011   Median :-0.16466   Median :-0.20449   
##  Mean   : 0.00000     Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000   
##  3rd Qu.: 0.18603     3rd Qu.: 0.15985   3rd Qu.: 0.20335   3rd Qu.: 0.00788   
##  Max.   :12.80401     Max.   :13.21499   Max.   :16.02787   Max.   :26.13316   
##  NA's   :143          NA's   :143        NA's   :143        NA's   :143        
##      Pulses.V1      Temperate_roots.V1 Tropical_roots.V1     Sunflower.V1   
##  Min.   :-0.48586   Min.   :-0.28270   Min.   :-0.43966   Min.   :-0.49024  
##  1st Qu.:-0.43492   1st Qu.:-0.28270   1st Qu.:-0.43966   1st Qu.:-0.49024  
##  Median :-0.18152   Median :-0.24262   Median :-0.25750   Median :-0.29274  
##  Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000  
##  3rd Qu.: 0.08921   3rd Qu.:-0.00056   3rd Qu.: 0.08321   3rd Qu.: 0.08903  
##  Max.   :18.45318   Max.   :30.30769   Max.   :25.09540   Max.   :18.13688  
##  NA's   :143        NA's   :143        NA's   :143        NA's   :143       
##      Soybean.V1       Groundnuts.V1       Rapeseed.V1        Sugarcane.V1   
##  Min.   :-0.44055   Min.   :-0.42903   Min.   :-0.32214   Min.   :-0.16489  
##  1st Qu.:-0.44055   1st Qu.:-0.42903   1st Qu.:-0.32214   1st Qu.:-0.16489  
##  Median :-0.33527   Median :-0.24285   Median :-0.20053   Median :-0.12973  
##  Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000  
##  3rd Qu.: 0.07736   3rd Qu.: 0.09089   3rd Qu.: 0.00380   3rd Qu.:-0.02986  
##  Max.   :18.83110   Max.   :28.73295   Max.   :24.24173   Max.   :37.48230  
##  NA's   :143        NA's   :143        NA's   :143        NA's   :143       
##      Others.V1      Managed_Grasslands.V1 Temperate_cereals.1.V1
##  Min.   :-0.86416   Min.   :-1.14803      Min.   :-0.24269      
##  1st Qu.:-0.61199   1st Qu.:-0.48318      1st Qu.:-0.24269      
##  Median :-0.10841   Median :-0.13424      Median :-0.19144      
##  Mean   : 0.00000   Mean   : 0.00000      Mean   : 0.00000      
##  3rd Qu.: 0.23593   3rd Qu.: 0.27012      3rd Qu.:-0.03013      
##  Max.   :11.81541   Max.   : 9.41777      Max.   :23.46723      
##  NA's   :143        NA's   :143           NA's   :143           
##      Rice.1.V1          Maize.1.V1     Tropical_cereals.1.V1    Pulses.1.V1    
##  Min.   :-0.33405   Min.   :-0.37124   Min.   :-0.24242      Min.   :-0.21988  
##  1st Qu.:-0.33405   1st Qu.:-0.37124   1st Qu.:-0.24242      1st Qu.:-0.21988  
##  Median :-0.29997   Median :-0.22663   Median :-0.24242      Median :-0.19206  
##  Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000      Mean   : 0.00000  
##  3rd Qu.: 0.00570   3rd Qu.: 0.03389   3rd Qu.:-0.07847      3rd Qu.:-0.05356  
##  Max.   :29.12739   Max.   :20.65714   Max.   :19.64764      Max.   :26.12341  
##  NA's   :143        NA's   :143        NA's   :143           NA's   :143       
##  Temperate_roots.1.V1   Sunflower.1.V1      Soybean.1.V1     Groundnuts.1.V1  
##  Min.   :-0.26220     Min.   :-0.14977   Min.   :-0.23353   Min.   :-0.25187  
##  1st Qu.:-0.26220     1st Qu.:-0.14977   1st Qu.:-0.23353   1st Qu.:-0.25187  
##  Median :-0.26220     Median :-0.14977   Median :-0.23353   Median :-0.25187  
##  Mean   : 0.00000     Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000  
##  3rd Qu.:-0.07223     3rd Qu.:-0.06512   3rd Qu.:-0.04155   3rd Qu.:-0.04506  
##  Max.   :19.86639     Max.   :30.77447   Max.   :37.37319   Max.   :37.77301  
##  NA's   :143          NA's   :143        NA's   :143        NA's   :143       
##    Rapeseed.1.V1      Sugarcane.1.V1      Others.1.V1    
##  Min.   :-0.17788   Min.   :-0.29669   Min.   :-0.31794  
##  1st Qu.:-0.17788   1st Qu.:-0.29669   1st Qu.:-0.30152  
##  Median :-0.17788   Median :-0.26895   Median :-0.15380  
##  Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000  
##  3rd Qu.:-0.15152   3rd Qu.:-0.05969   3rd Qu.: 0.03691  
##  Max.   :26.12795   Max.   :16.30080   Max.   :39.03201  
##  NA's   :143        NA's   :143        NA's   :143       
##  Managed_Grasslands.1.V1
##  Min.   :-0.28247       
##  1st Qu.:-0.28247       
##  Median :-0.21928       
##  Mean   : 0.00000       
##  3rd Qu.:-0.03580       
##  Max.   :25.22847       
##  NA's   :143
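The `.V1` suffixes in the summary (for example `area_km.V1`) appear to come from `scale()` returning a one-column matrix rather than a plain vector, so each centered column is stored as a matrix inside the data frame. A minimal sketch of the difference, using a hypothetical toy data frame rather than the thesis data:

```r
# Hypothetical toy data frame (values are made up)
df <- data.frame(area_km = c(10, 200, 3000, 45000))

# scale() returns a one-column matrix carrying centering attributes
as_matrix <- scale(df$area_km)

# Wrapping it in as.numeric() gives a plain numeric vector instead
as_vector <- as.numeric(scale(df$area_km))

is.matrix(as_matrix)   # TRUE
is.matrix(as_vector)   # FALSE

# Storing the plain vector keeps the column an ordinary numeric column
df$area_km_std <- as_vector
```

In a dplyr pipeline the same idea could be written as `across(..., ~ as.numeric(scale(.x)))`. This is a suggestion only; the matrix columns are harmless for `summary()` but can surprise functions that expect plain numeric columns.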